Information extraction with network centralities: finding rumor sources, measuring influence, and learning community structure

Author

  • Tauhid Zaman
Abstract

A network centrality is a function that takes a network graph as input and assigns a score to each node. In this thesis, we investigate the potential of network centralities for addressing inference questions arising in the context of large-scale networked data. These questions are particularly challenging because they require algorithms that are fast and simple enough to scale, yet still perform well. It is this tension between scalability and performance that this thesis aims to resolve by using appropriate network centralities. Specifically, we solve three important network inference problems using network centrality: finding rumor sources, measuring influence, and learning community structure.

We develop a new network centrality called rumor centrality to find rumor sources in networks. We give a linear time algorithm for calculating rumor centrality, demonstrating its practicality for large networks. Rumor centrality is proven to be an exact maximum likelihood rumor source estimator for random regular graphs (under an appropriate probabilistic rumor spreading model). For a wide class of networks and rumor spreading models, we prove that it is an accurate estimator. To establish the universality of rumor centrality as a source estimator, we utilize techniques from the classical theory of generalized Pólya urns and branching processes.

Next, we use rumor centrality to measure influence in Twitter. We develop an influence score based on rumor centrality which can be calculated in linear time. To justify the use of rumor centrality as the influence score, we use it to develop a new network growth model called topological network growth. We find that this model accurately reproduces two important features observed empirically in Twitter retweet networks: a power-law degree distribution and a superstar node with very high degree.
Using these results, we argue that rumor centrality is correctly quantifying the influence of users on Twitter. These scores form the basis of a dynamic influence tracking engine called Trumor, which allows one to measure the influence of users in Twitter or, more generally, in any networked data.

Finally, we investigate learning the community structure of a network. Using arguments based on social interactions, we determine that the network centrality known as degree centrality can be used to detect communities. We use this to develop the leader-follower algorithm (LFA), which can learn the overlapping community structure in networks. The LFA runtime is linear in the network size. It is also non-parametric, in the sense that it can learn both the number and size of communities naturally from the network structure without requiring any input parameters. We prove that it is very robust and learns accurate community structure for a broad class of networks. We find that the LFA does a better job of learning community structure on real social and biological networks than more common algorithms such as spectral clustering.

Thesis Supervisor: Devavrat Shah
Title: Associate Professor
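
The abstract mentions a linear time algorithm for rumor centrality but does not state the formula. As an illustration only, the sketch below assumes the tree formula commonly given for rumor centrality, R(v) = n! / (product over all nodes u of T_u^v), where T_u^v is the size of the subtree rooted at u when the tree is rooted at v. The function name `rumor_centrality` and the adjacency-dict input are hypothetical choices for this example, not taken from the thesis. It uses one pass to compute subtree sizes and one pass to propagate scores from parent to child, so the number of arithmetic operations is linear in the number of nodes.

```python
from math import factorial, prod


def rumor_centrality(adj):
    """Rumor centrality of every node of an undirected tree.

    adj: dict mapping each node to a list of its neighbours (assumed to be a tree).
    Assumed formula: R(v) = n! / prod_u T_u^v (see the lead-in above).
    """
    n = len(adj)
    root = next(iter(adj))  # arbitrary root

    # Depth-first traversal: record each node's parent and a preorder.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                stack.append(w)

    # Upward pass: subtree sizes with respect to the chosen root.
    size = {u: 1 for u in adj}
    for u in reversed(order):
        if parent[u] is not None:
            size[parent[u]] += size[u]

    # Score of the root directly from the assumed formula.
    scores = {root: factorial(n) // prod(size.values())}

    # Downward pass: moving the root from parent p to child c changes only two
    # subtree sizes, which gives R(c) = R(p) * size[c] / (n - size[c]).
    for u in order[1:]:
        scores[u] = scores[parent[u]] * size[u] // (n - size[u])
    return scores
```

Under this assumption, the node with the largest score would be the estimated rumor source. For example, on the path 0 - 1 - 2, `rumor_centrality({0: [1], 1: [0, 2], 2: [1]})` gives the middle node a score of 2 and each end node a score of 1. Exact integers are used here for clarity; for very large trees, working with logarithms of the scores would avoid huge factorials.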

Similar resources

Different Influence Models of Node Centrality in Transactional Community

This study investigates the various influence models of nodes' network centrality in the context of transactional community. Combining the Social Network Analysis (SNA) with Tobit regression, the research indicates that: i) a node's degree centrality (its followers) and betweenness centrality (the number of the shortest paths in which the node is included) have a positive impact on its network ...

Exponentially Twisted Sampling: a Unified Approach for Centrality Analysis in Attributed Networks

In our recent works, we developed a probabilistic framework for structural analysis in undirected networks and directed networks. The key idea of that framework is to sample a network by a symmetric and asymmetric bivariate distribution and then use that bivariate distribution to formally define various notions, including centrality, relative centrality, community, and modularity. The main ob...

A Model for Detecting of Persian Rumors based on the Analysis of Contextual Features in the Content of Social Networks

A rumor is a collective attempt to interpret a vague but attractive situation by using the power of words. Therefore, identifying the language of rumors can help in detecting them. Previous research on the rumor detection problem has focused more on the contextual information in reply tweets and less on the content features of the original rumor. Most of the studies have been in...

An Optimized Firefly Algorithm based on Cellular Learning Automata for Community Detection in Social Networks

Community structure is one of the important features of social networks. A community is a subgraph whose nodes have many connections to nodes inside the community and very few connections to nodes outside it. The objective of community detection is to separate groups or communities that are linked more closely. In fact, community detection is the clustering...

Rumor Propagation Model under Limited Information Exchange

Rumor propagation has been well studied in the past decade. The main focus has been on analyzing the dynamic behavior of the model system, while little attention has been paid to the limited information exchange among nodes in the network topology. In this paper, we numerically investigate how limited information transmission influences rumor spreading. The information packet transmission quant...

Publication date: 2011